Collective Ontology-based Information Extraction using Probabilistic Graphical Models
نویسنده
چکیده
Information Extraction (IE) is a process of extracting structured data from unstructured sources. It roughly consists of subtasks named entity recognition, relation extraction and coreference resolution. Researchers have primarily focused just on one subtask or their combination in a pipeline. In this paper we introduce an intelligent collective IE system combining all three subtasks by employing conditional random fields. The usage of same learning model enables us to easily communicate between iterations on the fly and to correct errors during iterative process execution. In addition to the architecture we introduce novel semantic and collective feature functions. The system’s output is labelled according to an ontology and new instances are automatically created during runtime. The ontology as a schema encodes a set of constraints, defines optional manual rules or patterns and with instances provides semantic gazetteer lists. The proposed framework is being developed during ongoing PhD research. It’s main contributions are intelligent iterative interconnection of the selected subtasks, extensive use of context-specific features and parameterless system that can be guided by an ontology. Some preliminary results combining just two subtasks already show promising results over traditional approaches.
منابع مشابه
Proceedings Template - WORD
Most of the approaches for dealing with uncertainty in the Semantic Web rely on the principle that this uncertainty is already asserted. In this paper, we propose a new approach to learn and reason about uncertainty in the Semantic Web. Using instance data, we learn the uncertainty of an OWL ontology, and use that information to perform probabilistic reasoning on it. For this purpose, we use Ma...
متن کاملRule-based joint fuzzy and probabilistic networks
One of the important challenges in Graphical models is the problem of dealing with the uncertainties in the problem. Among graphical networks, fuzzy cognitive map is only capable of modeling fuzzy uncertainty and the Bayesian network is only capable of modeling probabilistic uncertainty. In many real issues, we are faced with both fuzzy and probabilistic uncertainties. In these cases, the propo...
متن کاملRelational Markov Networks for Collective Information Extraction
Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We pres...
متن کاملCollective Information Extraction with Relational Markov Networks
Most information extraction (IE) systems treat separate potential extractions as independent. However, in many cases, considering influences between different potential extractions could improve overall accuracy. Statistical methods based on undirected graphical models, such as conditional random fields (CRFs), have been shown to be an effective approach to learning accurate IE systems. We pres...
متن کاملA Note on Methodology for Designing Ontology Management Systems
In this note we propose a novel methodology for designing Ontology Management Systems architecture, which grounds on an ontology representation based on probabilistic Graphical Models. By discussing about troubles with ontology as tool for managing knowledge, formal assumptions about semantics definition and representations rise, turning out an original architecture that will be presented and dis-
متن کامل